The ability to detect lying is an important skill.
While the polygraph is the most common mechanical method used for lie 
detection, other electronic-based methods have also been developed. 
One such method, the analysis of voice stress patterns, is based on 
the assumption that lying is a stressful activity which reduces 
involuntary frequency modulations in the human voice. One variation 
of voice analysis involves recording interviews and then transmitting
the recordings through the telephone to a second location where the
voice is re-recorded, charted, and evaluated. Voice stress analyses 
were performed on 15 tape-recorded pre-employment interviews in both 
their original form and after they had been transmitted via telephone 
and re-recorded. Four expert voice stress examiners, blind to the 
telephone condition, reported less stress in the telephone charts 
than in the original charts. There was little relationship between 
the stress rating for the same charts in their original and telephone 
forms. Reliability estimates were low for both the original and 
telephone stress ratings. Summing over the stress ratings from 
individual questions and advanced training on the part of the
examiners both appeared to improve the reliability estimates. The 
continued use of telephone recorded tapes as substitutes for the 
original tapes is highly questionable. In addition, these results
suggest that voice analysis ratings, as they are currently used, do
not show sufficient reliability to warrant their continued use as a
selection procedure for employment. (NB) 
Abstract



Voice stress analyses were performed on tape recorded 
pre-employment interviews in both their original form and after 
they had been transmitted via telephone and re-recorded. Expert
voice stress examiners, blind to the telephone condition,
reported less stress in the telephone charts than in the original
charts. There was little relationship between the stress rating
for the same charts in their original and telephone forms.
Reliability estimates were low for both the original and
telephone stress ratings. Summing over the stress ratings from
individual questions and advanced training on the part of the
examiners both appeared to improve the reliability estimates. The
continued use of telephone recorded tapes as substitutes for the
original tapes is highly questionable. In addition, these results 
suggest that voice analysis ratings, as they are currently used, 
do not show sufficient reliability to warrant their continued use 
as a selection procedure for employment. 
Voice Stress Analysis - 3
Voice Stress Analysis: 
Use of Telephone Recordings 
The ability to detect deception and lying is an important and
much sought-after skill. Lie detection plays a role in civil and
criminal court cases (Kleinmuntz & Szucko, 1982 and Lykken,
1984); industrial settings (Bell, 1981; Lykken, 1981; Sackett &
Decker, 1979); and government settings (Mervis, 1983 and Saxe,
Dougherty, & Cross, 1985). While the polygraph is the most
common mechanical method in use for lie detection (Kleinmuntz &
Szucko, 1984; Sackett & Decker, 1979), other electronic-based
methods have recently emerged. One such system involves the
analysis of voice stress patterns, a technique which has been
subjected to only limited study by psychologists (Sackett &
Decker, 1979).

Voice analysis is based on the assumption that lying is a
stressful activity which reduces involuntary frequency modulations
in the human voice (Dektor Counterintelligence and
Security, Inc. [Dektor], 1971). Dektor claims that vocal modulations
are detected, measured, and displayed by the voice analysis
equipment and examiners can be trained to interpret these dis- 
plays. The use of voice analysis techniques for identification of 
lying is reported to have several advantages over polygraph 
procedures, including eliminating the need for direct physical 
hookups, the ability to use recordings taken without the know- 
ledge of the person, the versatility of being able to conduct the 
interview in almost any location (restricted only by being able 
to use a tape recording), and being able to transmit recordings 
of the interview over the telephone for evaluation elsewhere 
(Dektor, 1971). The increasing use of voice stress lie detection
methods in government and industry (Bell, 1981), and the potential
for abuse of voice stress analysis (Hollien, 1980) indicate
the need to investigate the various claims being made concerning
this new technique.

While proponents (e.g., Bell, 1981) have claimed that voice
stress analysis techniques are at least as good as polygraphs in
detecting deception, laboratory evidence is at this time equivocal.
Horvath (1978) and Kubis (1973) both reported that voice
stress analysis produced approximately chance level identification
of lying in mock crime situations, while the polygraph
equipment performed well beyond chance levels. Horvath also
reported a correlation of .38 between the two voice stress
examiners. In a follow up study Horvath (1979) put additional stress
on the subjects by only awarding extra credit if the subjects
were successful in either being caught or avoiding being caught
lying. Once again the "hit rates" for voice stress testing were
no better than chance but the correlation between the examiners
was higher, r = .65. Both Bell (1981) and Heisse (1976) have
asserted that the lack of positive findings is due to the
generally non-risk nature of the experimental setting, and
maintain that only in real world situations can voice stress
equipment be tested fairly. A similar argument has been used to
explain the negative results from laboratory studies of
polygraphs (Lykken, 1979).

Attempts to improve the ecological validity of voice stress
research have produced more positive results, but in less
controlled situations. Kradz (1971) tape recorded 42 polygraph
interviews with suspects or victims of actual crimes. Blind
evaluations of the voice analysis charts agreed with polygraph
results in all but one case. In two separate voice analysis
evaluations the examiners agreed perfectly. Kradz also reported
that the final dispositions of the cases were observed and
corroborated the results of the polygraph examinations. Heisse
(1976) collected 53 voice analysis interviews acquired during
actual criminal or pre-employment investigations. The final
dispositions of these cases were known, usually through confessions.
Two examiners blindly rated the voice stress data using a
standardized evaluation method developed by Heisse (1974). Heisse
(1976) reported that 97% of the examiners' ratings were correct,
and the interrater reliability was .96. He concluded that the use
of standardized methods in a non-experimental environment
produced these very positive results.

At a more basic level, attempts to demonstrate that voice
analysis evaluations can detect stress have produced mixed
results. Lynch and Henry (1979) found voice stress evaluators
unable to correctly identify responses to taboo versus non-taboo
words. However, VanDercar, Greaner, Hibler, Spielberger, and
Bloch (1980) found that voice stress evaluation identified
changes in state anxiety (State-Trait Anxiety Inventory,
Spielberger, Gorsuch, & Lushene, 1970) only when the threat of shock
or taboo words was high. They reported an interrater reliability
for the four raters of .92. Brenner, Branscomb, and Schwartz
(1979) found that voice analysis of individuals varied as a
function of the task difficulty associated with mathematical
problems they were solving. These results suggest that the voice
stress analysis technique may have some validity with high stress
stimuli.

While the above review of the research on voice stress
analysis suggests some value for the technique, practitioners use
the procedures for lie detection in a variety of situations in
which our knowledge is sorely lacking. One of the areas where
practice may have exceeded our understanding is in the use of the
telephone for transmitting voice recordings of interviews. With
telephone transmission, an interview is conducted and recorded at
one site; the recording is then played through the telephone and
re-recorded at another location, where it is charted by the
equipment and evaluated by the examiners. Dektor (1971), the
manufacturer of the Psychological Stress Evaluator (P.S.E.),
claims that their device works as well using telephone recordings
as it does with the original recording. There are several reasons
for suggesting that this claim may not be correct. The P.S.E. is
believed to detect frequency modulations in the 8-14 Hz region
while the telephone transmits frequencies in the 300-3300 Hz
range. Further, there is the potential for a variety of noise to
be introduced into the recordings by the transmission process.
These problems have been previously identified by Hollien (1980)
but no attempts to investigate this issue were found. In addition
to the telephone transmission concerns, there is a need to provide
further work on the reliability and validity of the voice stress
analysis approach.

The primary purpose of this study was to determine if telephone
recorded tapes of pre-employment interviews and non-telephone
recordings of the same interviews were evaluated in a
similar way. Reliability was determined for each method and
for different types of questions.

Method 

Subjects 

Tape recorded interviews were selected from the files of a 
midwest security consulting company which routinely conducts both 
in-person and telephone-transmitted pre-employment P.S.E. inter- 
views. The 15 subjects whose records were used had applied for 
sales positions with the same retail organization. Each of the 
subjects had undergone an in-person P.S.E. examination and each 
had been judged deceptive by the original examiner. 

Four professional P.S.E. examiners agreed to rate the P.S.E. 
charts (the tracings of the frequency modulations). Although the 
qualifications of the raters varied, all four had completed an 
approved training program in P.S.E. chart interpretation and had 
field experience in chart interpretation. The raters received the
charts in the mail and returned them by mail after making their
evaluations.
Equipment

The original interviews of the subjects were recorded using
a Uher 4000IC reel to reel tape recorder. These recorded
interviews were charted using a P.S.E. Model 101. For the telephone
charts the following procedures were followed. The original tapes
of the interview were transmitted to the security company over
the telephone. While the manufacturer of the P.S.E. provides
accessories for use with the telephone, the security firm uses
its own equipment. This equipment consists of a Superscope
C-202LP cassette tape recorder which is wired directly into a
standard telephone (by-passing the handset) at the origin of the
transmission and a Uher 4000IC recorder wired directly into the
telephone at the terminal end of the transmission. While
telephone transmissions are often done on a long distance line,
the telephone tapes were produced using a local line. The Uher
4000IC recording was used to produce the chart.
Procedures 

Two sets of P.S.E. charts were evaluated by each of the
raters. One set consisted of the 15 "Regular" charts taken
directly from the recordings of the interviews, and the second
set consisted of the 15 "Telephone" charts produced by the
procedures described above. The charts from the two sets were
presented in a random order with the restriction that a Regular
and a Telephone chart from the same person could not be presented
one after the other. The raters were informed that the charts
were from 30 different individuals who had all responded to the
same 23 questions. The raters were unaware of the nature of the
research question. Stress was rated on a 5 point scale: (1) little
or no stress indicated; (2) a small, but noticeable amount of
stress is present; (3) a moderate amount of stress is present
which is indicative of more than "general nervousness"; (4) heavy
stress, the question evoked a strong reaction in the subject; and
(5) extreme stress, a virtual panic reaction. A final rating scale
was included on the form and asked evaluators to rate the degree
of overall stress. Two raters declined to use the overall stress
scale and it was, therefore, dropped from any of the analyses.
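
The presentation order described above (a random order in which a person's Regular and Telephone charts never appear one after the other) can be sketched in a few lines of Python. This is only an illustration: the function name, the (subject, chart type) labels, and the reshuffle-until-valid strategy are assumptions made here, not the procedure the authors report using.

```python
import random

def constrained_order(n_subjects=15, seed=None):
    """Return a random ordering of (subject, chart_type) pairs in which
    the two charts from the same subject are never adjacent.

    Hypothetical helper: reshuffles until the restriction is satisfied.
    """
    if n_subjects < 2:
        raise ValueError("need at least two subjects to separate chart pairs")
    rng = random.Random(seed)
    charts = [(subject, kind)
              for subject in range(1, n_subjects + 1)
              for kind in ("Regular", "Telephone")]
    while True:
        rng.shuffle(charts)
        # Accept the shuffle only if no neighbouring pair shares a subject.
        if all(a[0] != b[0] for a, b in zip(charts, charts[1:])):
            return list(charts)

order = constrained_order(seed=1)
```

With 15 subjects a valid order is usually found within a few reshuffles, so the simple rejection approach is adequate for a set of 30 charts.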

The pre-employment interviews used in this study were
conducted using a control question format (Szucko & Kleinmuntz,
1981). In the control question approach the individual is asked
"relevant" questions (e.g., Have you ever stolen cash from a
previous employer?) and the responses are compared to the
responses to "irrelevant" questions (e.g., Do you sometimes drive
a car?). The "irrelevant" questions are also referred to as
"known truth" questions, used to establish an assumed baseline of
honest responding. Other control questions are intended to
produce stress responses and include the "known lie" and the
"outside issues" questions. Comparisons between the charts from
various types of questions presumably allow judgments concerning
the truthfulness of the responses.
Analysis 

The initial analysis was a multivariate ANOVA with the
questions used as the dependent variables (this was suggested by
Saal, Downey, and Lahey, 1980). Type of chart (Regular and
Telephone), raters, and ratees were the independent variables with 2,
4, and 15 levels respectively. The MANOVA allowed a test of
mean differences between the chart types, a test of the
significance of intraclass correlation (Ratee effect), and a test
of the degree to which the raters produced different, relatively
higher or lower, levels of stress rating. If either chart type
and/or an interaction between chart type and another variable was
significant, the assessment of reliability from the MANOVA would
not be meaningful and a secondary set of Ratee by Rater ANOVAs,
one for each question, would be conducted within chart type.
These ANOVAs would allow for estimating the intraclass
reliabilities within chart type for each question using the appropriate
estimate of reliability for a single rater (Shrout & Fleiss,
1980; Model 3,1). If it is assumed that the responses to single
questions are all measuring stress and that a summation over the
items would be a more reliable measure of this stress, a new
score could be computed for each rater. This score was produced
by adding the stress ratings for each relevant item together for
a rater for each of the 30 charts. Coefficient Alphas were also
computed for each rater on each chart type using raters as the
test and the 14 relevant questions as the items. Pearson product
moment correlations were then computed between the summed ratings
(for each rater) for the fifteen ratees and the two chart
conditions. The resultant correlation matrix provides a
multi-method-multi-rater look at the ratings (Lawler, 1967 and
Campbell & Fiske, 1959). The correlations between raters using
the same chart type estimate the interrater reliability of the
summed ratings using a particular type of chart. The correlation
across chart type for the same rater shows the degree of method
convergence. The cross method and rater correlations indicate the
degree of convergence over both charts and raters.
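
The single-rater reliability estimate referred to above (Shrout and Fleiss's Model 3,1) is obtained from the mean squares of a Ratee by Rater ANOVA as (BMS - EMS) / (BMS + (k - 1) EMS), where BMS is the between-ratee mean square, EMS the residual mean square, and k the number of raters. The following plain-Python sketch shows the arithmetic; the function name and the small ratings matrix are illustrative assumptions and are not the study's data.

```python
def icc_3_1(ratings):
    """ICC(3,1): reliability of a single fixed rater (consistency).

    ratings[i][j] is the rating given to ratee i by rater j.
    """
    n = len(ratings)       # number of ratees
    k = len(ratings[0])    # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    ratee_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_ratee = k * sum((m - grand) ** 2 for m in ratee_means)
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_ratee - ss_rater

    bms = ss_ratee / (n - 1)               # between-ratee mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)

# Illustrative matrix: 6 ratees (rows) rated by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_3_1(example), 2))  # 0.71
```

Because rater main effects are removed before the residual is formed, this estimate reflects agreement in the rank ordering of ratees rather than agreement in absolute stress level.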

Results 

All three main effects (chart type, raters, and ratees) for
the multivariate analysis of variance were significant and the
chart type by ratee interaction effect was also significant.
Table 1 gives the multivariate results and the univariate F-Tests
for each question. Eleven of the 23 univariate tests were
significant for chart type, 14 for the raters, 22 for ratees, and 9 for
the chart type by ratee interaction. As a general rule the
questions from the telephone charts were rated as showing less
stress and this was true for all the questions where the
difference was significant. Raters demonstrated a moderate level of
reliability (intraclass correlations were computed but are not
shown) in the rank ordering of the charts for each question (when
averaged over chart type). Fourteen Rater univariate main effects
and the multivariate main effect were significant. Raters
differed in their ratings of stress over all ratees and chart types.
Given the mean differences between chart types and the
significant chart type by ratee interactions, it was necessary to
conduct separate rater by ratee analyses for each question to make a
meaningful assessment of reliability within chart type.

Insert Table 1 about here



Table 2 summarizes the results of the rater by ratee
univariate analyses for each chart type and each question.
Questions were organized in Table 2 by question type: relevant, known
lie, and known truth. For the Regular charts, 17 (out of the 23)
questions had significant ratee effects. All of the intraclass
correlations were less than .51 and the majority yielded values
less than .4. For the Telephone charts, 19 ratee main effects
were significant. All of the intraclass correlations were found
to be less than .64 and the majority were less than .4. These
results indicated that while there was a significant level of
interrater reliability, the reliability estimates were quite low.
When the ratings (summed over raters) for Regular charts were
correlated with the summed ratings for Telephone charts for each
question, only 4 out of the 23 resultant correlations were found
to be significant (see Table 2).



Insert Table 2 about here 



Table 2 demonstrates one other important finding. The
differences between the Regular and Telephone charts were more
prevalent for the relevant items than they were for the known
truth and known lie items; 9 out of 14 relevant items, 1 out of 6
known truth items, and 1 out of 3 known lie items were
significant. In all the cases where a significant difference occurred
between the Regular and Telephone charts, the Regular charts were
rated as showing higher stress.

As a final method of determining what was happening, the
stress ratings for the 14 Relevant questions were summed over the
questions for each rater. This resulted in each rater having a
stress score for each individual on each chart type. These 8
different kinds of summated ratings were correlated over the 15
ratees (see Table 3). Table 3 also shows the reliability
(coefficient alpha) for each rater over the relevant questions
and for each chart type. The reliability estimates suggest that,
if a rater had rated an individual as high (or low) on a
relevant question, then they tended to rate them high (or low) on
all the other relevant questions. The circled values in Table 3
represent cross method convergence, the relationship between the
ratings for a single rater on one chart with his ratings on the
other chart. Raters 1 and 3 both had significant correlations
between their rating on one type of chart with their rating on
the other type. Only rater 1's correlation (.71) would be
considered an acceptable level of convergence. When the
correlations among the four raters were examined, both within a
chart type and over types of charts, only the ratings from raters
1 and 3 showed consistent significant correlations; Raters 1 and
3 correlated r = .62 within the Regular condition and .86 in the
Telephone condition. Rater 1's ratings in the Regular condition
were significantly related to rater 3's ratings in the Telephone
condition (.56) and vice versa (.57). Raters 2 and 4 showed some
convergence with rater 3 in the Regular and Telephone conditions
respectively but there was no other convergence.
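
The summated-rating analysis above can be illustrated with a short sketch: sum one rater's ratings over the relevant items for each ratee, estimate coefficient alpha over those items, and correlate the summed scores obtained from the two chart types. The function names and the toy numbers (four ratees, three items) are hypothetical and chosen only for illustration; they are not the study's data.

```python
from statistics import mean, variance

def coefficient_alpha(scores):
    """Coefficient alpha; scores[i][j] is the rating of ratee i on item j."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Toy ratings by one rater: rows are ratees, columns are relevant items.
regular = [[2, 3, 2], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
telephone = [[1, 2, 2], [3, 4, 4], [1, 1, 1], [3, 2, 3]]

regular_sum = [sum(row) for row in regular]       # summated Regular score
telephone_sum = [sum(row) for row in telephone]   # summated Telephone score

alpha_regular = coefficient_alpha(regular)
cross_method_r = pearson_r(regular_sum, telephone_sum)
```

In the terms used here, `alpha_regular` corresponds to a diagonal entry of Table 3 and `cross_method_r` to a same-rater, cross-chart correlation.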



Insert Table 3 about here 



Discussion 

The research questions asked in this study can, within a
limited context, be answered. There is evidence for only a
limited degree of interrater reliability for questions from
either Regular or Telephone charts. Further, the data
demonstrated that the use of Telephone charts led to lower
stress ratings and the telephone ratings were not correlated with
the ratings from a Regular chart. These findings would argue
strongly for discontinuing the use of telephone reproduced tapes
as a substitute for regular charts since both mean differences in
charts and a lack of convergence were found. Given that the charts
used in this study were from actual employment interviews, they
are not subject to concerns about their ecological validity
(Bell, 1981 and Heisse, 1976).

The analysis of the reliability of single questions from
both types of charts was less than encouraging. The intraclass
correlations, while generally significant, were low and averaged
less than .40. Little, if any, difference in reliability occurred
over the type of question. These estimates are consistent with
the .38 value reported by Horvath (1978). Basing important
personnel decisions upon ratings where there is so little
consensus between two different raters is, to say the least,
questionable.

The additional analyses, done with only the Relevant items,
offered some hope for improving the reliability of voice analysis
ratings. Using the traditional method of summing over item
responses (Edwards, 1957), it was apparent that each of the
raters was acting in a very consistent fashion for a ratee over
items. Further, for raters 1 and 3, the ratings were consistent
over the chart conditions as well as within the chart condition.
Raters 1 and 3 were the most experienced and had the best
training of the 4 raters. These significant correlations between the
Regular and Telephone charts (summated ratings) for raters 1 and
3 support the view that some consistent factor in the charts was
being observed. However, the mean differences for the summated
ratings (observed in all four raters) between the Regular and
Telephone charts would lead to lower levels of stress (lying)
being attributed to the same individual, depending on the source
of the charts. Under the best of conditions (assuming the raters
are well trained and they are in fact rating stress which is
related to lying), some adjustment would be required in the mean
stress levels for charts from telephone tapes. Also, this
conclusion would only hold under the situation where stress is
evaluated on a series of questions and then added (averaged) over
these questions, a procedure which is NOT currently being used.

The results of this study do not provide any direct evidence
as to what raters are evaluating. There is, however, some
indirect evidence to indicate that the information being evaluated is
affected by the content of the question. The mean differences
between the Regular and Telephone charts were more often
significant for the Relevant items (64%) versus the other items (22%).
This suggests that whatever the evaluators were rating, when they
were dealing with relevant stress items their ratings were
affected by the chart type. If evaluators were rating an
individual characteristic unrelated to stress, e.g., voice quality, it
would be expected that similar mean differences would be found in
both the relevant and the non-relevant questions.
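
The 64% and 22% figures follow directly from the counts reported in the Results (9 of 14 relevant items; 1 of 6 known truth items and 1 of 3 known lie items). A quick arithmetic check, with variable names chosen here purely for illustration:

```python
# Items with a significant Regular vs. Telephone mean difference,
# using the counts given in the Results section.
significant = {"relevant": 9, "known_truth": 1, "known_lie": 1}
total = {"relevant": 14, "known_truth": 6, "known_lie": 3}

relevant_pct = 100 * significant["relevant"] / total["relevant"]
other_pct = 100 * (significant["known_truth"] + significant["known_lie"]) / (
    total["known_truth"] + total["known_lie"])

print(round(relevant_pct), round(other_pct))  # 64 22
```

The same counts also reproduce the totals given earlier: 11 significant chart type effects across 23 questions.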

There is little evidence in this study to support the value
of voice analysis, as it is currently being used, as a technique
for identification of lying, a view shared by Sackett and Decker
(1979, p. 501). The average reliability (interrater) of single
raters on single questions was too low to justify the continued
use of stress ratings for individual selection purposes. Further,
the continued use of tapes transmitted by telephone as a
substitute for regular charts has to be severely questioned given
the general lack of correlation with regular charts and the lower
mean scores given to the Telephone charts. The results obtained
from using summated ratings over questions offer some future
possibilities and a systematic alternative to the current
practice of asking evaluators to provide a new summary judgment
concerning the overall level of stress. If the current
methodology for evaluating the charts is modified, it results in
improvements in the reliability estimates. The individual differences
between raters in the interrater reliability estimates need
further investigation, and if training is as important as these
results suggest, then it is imperative that more time, effort,
and resources go into the development and evaluation of training
programs.

The use of voice stress analysis techniques and the public's
belief in its value as a method for detecting deception has grown
over the last few years. As pointed out by Kleinmuntz and Szucko
(1984) in their discussion of polygraphic evidence, positive
beliefs by the public and supportive pronouncements by
proponents/users tend to overwhelm any scientific evidence to the
contrary. Since voice analysis results are being used to
determine the employability of individuals, and since work is, for
most individuals, a major factor in their physical, social, and
personal wellbeing, it is mandatory to ensure that the
reliability and validity of a selection technique is sufficient to
warrant its continued use. If voice stress analysis is (as is
suggested here) found wanting in reliability, then its use should
be discontinued until evidence can be supplied as to its value.

Table 1 - F-Values for the Chart by Rater by Ratee ANOVA
for Each Question

F-VALUES [...]

Degrees of freedom: C = 1, RR = 3, RE = 14, C X RR = 3, C [...] RE = 42, and ERROR = [...]

Table 2 - Means, Standard Deviations, and ICCs by Item for Regular (R)
and Telephone (T) Charts with the Correlations Between
Conditions (R & T) - Ratings Averaged over Raters

Item    Chart   Mean    S.D.    ICC      r       Notes
[...]   [...]   2.00    0.67    0.28     0.26
[...]   T       2.25    0.76    0.40    -0.25
13      R       2.25    0.56    0.13
        T       2.23    0.59    0.09     0.13
21      R       2.33    0.40   -0.08             5
        T       2.37    0.84    0.43     0.12

OSI & KL ITEMS
3       R       2.30    0.67    0.39             2
        T       2.45    1.01    0.64     0.02
7       R       2.53    0.77    0.45             2, 3
        T       2.46    0.66    0.20     0.52
14      R       2.65    0.83    [...]            1, 2
[...]   [...]   2.25    [...]

1 Means were significantly different (p<.05) in the condition by rater
  by ratee ANOVA.
2 Ratee effects (R & T) were significant (p<.05) - rater by ratee ANOVA.
3 The correlation between R and T ratings was significant (p<.05).
4 Same as footnote 2 but only the R ICC was significant.
5 Same as footnote 2 but only the T ICC was significant.

Table 3 - Single Rater Reliabilities over the 'Relevant' Items with Means
and Standard Deviations and Intercorrelations between Raters

                                              RATER
RATER   MEAN   S.D.   1-R      2-R      3-R      4-R      1-T      2-T      3-T      4-T
1-REG   42.93  13.34  (0.93)
2-REG   41.53  13.07  0.21     (0.96)
3-REG   39.27   4.91  0.62**   0.46*    (0.60)
4-REG   32.33   8.67  0.29     -0.03    0.33     (0.89)
1-TEL   34.07  14.46  0.71     0.33     0.57     0.42     (0.96)
2-TEL   39.00  14.34  0.25     -0.24    -0.11    0.22     0.28     (0.96)
3-TEL   34.20   9.31  0.56*    0.28     0.44*    0.30     0.86**   0.26     (0.92)
4-TEL   25.33   8.97  0.25     0.23     0.27     -0.31    0.40     0.13     0.50     (0.90)

1 REG = ratings of the regular charts.
2 Values in () are ICCs for a single rater over the 'relevant' items.
3 TEL = ratings of the telephone charts.
* p < .05; ** p < .01